Tag

#model compression

8 articles

NVIDIA Introduces X-Token: Projection-Guided Cross-Tokenizer KD That Outperforms GOLD by +3.82 Average Points on Llama-3.2-1B

This article explains NVIDIA's X-Token, a novel knowledge distillation technique that improves the performance of smaller language models by addressing token misalignment issues in previous methods like GOLD. It details how projection-guided cross-tokenizer alignment enhances model compression and deployment efficiency.

May 29 · 50

Understanding LLM Distillation Techniques

Learn how to implement basic LLM distillation techniques to train smaller, more efficient models that mimic larger pre-trained models.

May 11 · 60

Qwen AI Releases Qwen-Scope: An Open-Source Sparse AutoEncoders (SAE) Suite That Turns LLM Internal Features into Practical Development Tools

This explainer explores how Qwen-Scope, an open-source suite from Alibaba's Qwen team, uses sparse autoencoders to extract and transform LLM internal features into practical development tools, advancing model interpretability and functionality.

Apr 30 · 45

tech

Why I'm recommending last year's phones over 2026 models - with one exception

This explainer explores how AI model optimization techniques have made older smartphones more efficient than newer models, challenging the assumption that newer is always better.

Apr 19 · 80

Researchers from MIT, NVIDIA, and Zhejiang University Propose TriAttention: A KV Cache Compression Method That Matches Full Attention at 2.5× Higher Throughput

Learn how TriAttention compresses the KV cache in large language models, reaching 2.5× higher throughput while matching the accuracy of full attention.

Apr 11 · 58

How Knowledge Distillation Compresses Ensemble Intelligence into a Single Deployable AI Model

Knowledge distillation offers a way to compress the intelligence of complex model ensembles into a single, deployable AI model, making high-performance AI practical for real-world applications.

Apr 10 · 88

Multiverse Computing pushes its compressed AI models into the mainstream

Learn about model compression techniques that reduce the size and computational requirements of large AI models while maintaining performance, enabling broader AI deployment.

Mar 18 · 104

OpenAI turns model compression into a talent hunt with its 16 MB "Parameter Golf" challenge

OpenAI launches a 16 MB model compression challenge to advance AI efficiency and scout for top talent.

Mar 18 · 98